Complex, Corpus-Driven, Syntactic Features for Word Sense Disambiguation

Authors

  • Ari Chanen
  • Jon Patrick
Abstract

Although syntactic features offer more specific information about the context surrounding a target word in a Word Sense Disambiguation (WSD) task, in general they have not distinguished themselves much above positional features such as bag-of-words. In this paper we offer two methods for increasing the recall rate when using syntactic features on the WSD task: 1) using an algorithm for discovering in the corpus every possible syntactic feature involving a target word, and 2) using wildcards in place of the lemmas in the templates of the syntactic features. In the best experimental results on the SENSEVAL-2 data we achieved an F-measure of 53.1%, which is well above the mean F-measure of 44.2% for official SENSEVAL-2 entries. These results are encouraging considering that only one kind of feature is used and that the machine learning relies only on a simple Support Vector Machine (SVM) run with its default settings.
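The second method, replacing lemmas in a feature template with wildcards, can be illustrated with a minimal sketch. The abstract does not specify how the templates are represented, so everything here is an assumption made for illustration: a feature is modeled as a flat tuple of lemmas and dependency labels, the caller says which positions hold context lemmas, and the function name `wildcard_variants` is hypothetical. The idea shown is only that a more general (wildcarded) variant of a feature matches more contexts, which is one plausible route to the higher recall the abstract mentions.

```python
from itertools import combinations

WILDCARD = "*"


def wildcard_variants(feature, lemma_slots):
    """Return the feature plus every variant with some context lemmas wildcarded.

    feature     -- tuple of strings (hypothetical template: lemmas and relation labels)
    lemma_slots -- indices in `feature` that hold context lemmas eligible for replacement
    """
    variants = []
    # Replace 0, 1, ..., all of the eligible lemma slots with the wildcard symbol.
    for k in range(len(lemma_slots) + 1):
        for chosen in combinations(lemma_slots, k):
            variant = tuple(
                WILDCARD if i in chosen else token
                for i, token in enumerate(feature)
            )
            variants.append(variant)
    return variants


# Hypothetical dependency path around the target word "art":
#   buy -dobj-> art -amod-> modern
feature = ("buy", "dobj", "art", "amod", "modern")

# Assumption: only the two context lemmas (positions 0 and 4) may be wildcarded;
# the target lemma and the relation labels are kept fixed.
for variant in wildcard_variants(feature, lemma_slots=[0, 4]):
    print(variant)
```

Each variant could then be used as a separate binary feature for the classifier. Whether the original system keeps the target lemma fixed, or wildcards relation labels as well, is not stated in the abstract.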

Similar resources

Word Sense Disambiguation of Ambiguous Persian Words with the LDA Topic Model

Word sense disambiguation is the task of identifying the correct sense of a word in a given context from a finite set of possible senses. In this paper, a model for Farsi word sense disambiguation is presented. The model uses two groups of features: first, all words and stop words around the target word, and second, topic models. We extract topics from a Farsi corpus with Latent Dirichlet ...

Inducing Sense-Discriminating Context Patterns from Sense-Tagged Corpora

Traditionally, context features used in word sense disambiguation are based on collocation statistics and use only minimal syntactic and semantic information. Corpus Pattern Analysis is a technique for producing knowledge-rich context features that capture sense distinctions. It involves (1) identifying sense-carrying context patterns and (2) using the derived context features to discriminate b...

Knowing a word by the company it keeps: Using Local Information in a Maximum Entropy Model for Word Sense Disambiguation

Word sense disambiguation (WSD) is a key problem in computational linguistics, with applications in areas such as machine translation and information retrieval. This paper describes a corpus-based method for word sense disambiguation which uses a versatile maximum entropy technique on simple local lexical features and a rich description of the syntactic context of a word to distinguish between ...

Evaluation of Linguistic Features for Word Sense Disambiguation with Self-Organized Document Maps

Word sense disambiguation automatically determines the appropriate senses of a word in context. We have previously shown that self-organized document maps have properties similar to a large-scale semantic structure that is useful for word sense disambiguation. This work evaluates the impact of different linguistic features on self-organized document maps for word sense disambiguation. The featu...

Stochastic HPSG Parse Disambiguation Using the Redwoods Corpus

This article details our experiments on HPSG parse disambiguation, based on the Redwoods treebank. Using existing and novel stochastic models, we evaluate the usefulness of different information sources for disambiguation – lexical, syntactic, and semantic. We perform careful comparisons of generative and discriminative models using equivalent features and show the consistent advantage of discr...

Publication date: 2004